Interlingual Annotation for MT Development

Authors

  • Florence Reeder
  • Bonnie J. Dorr
  • David Farwell
  • Nizar Habash
  • Stephen Helmreich
  • Eduard H. Hovy
  • Lori S. Levin
  • Teruko Mitamura
  • Keith J. Miller
  • Owen Rambow
  • Advaith Siddharthan

Abstract

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language, which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

Similar resources

Keynote Address Some notes on the state of the art: Where are we now in MT: what works and what doesn’t? And the role of MT as an international collaborative activity

The paper examines briefly the impact of the “statistical turn” in machine translation (MT) R&D in the last decade, and particularly the way in which it has made large scale language resources (lexicons, text corpora etc.) more important than ever before and reinforced the role of evaluation in the development of the field. But resources mean, almost by definition, co-operation between groups a...

Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

The Automatic Creation of Lexical Entries for a Multilingual MT System

In this paper, we describe a method of extracting information from an on-line resource for the construction of lexical entries for a multi-lingual, interlingual MT system (ULTRA). We have been able to automatically generate lexical entries for interlingual concepts corresponding to nouns, verbs, adjectives and adverbs. Although several features of these entries continue to be supplied manually w...

A Multi-Level Approach to Interlingual MT: Defining the Interface between Representational Languages

This paper describes a multi-level design, i.e., a non-uniform approach to interlingual machine translation (MT), in which distinct representational languages are used for different types of knowledge. We demonstrate that a linguistically-motivated “division of labor” across multiple representation levels has not complicated, but rather has readily facilitated, the identification and construction...

Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation

This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explic...

Publication date: 2004